Operator Language: A Program Generation Framework for Fast Kernels
نویسندگان
چکیده
We present the Operator Language (OL), a framework to automatically generate fast numerical kernels. OL provides the structure to extend the program generation system Spiral beyond the transform domain. Using OL, we show how to automatically generate library functionality for the fast Fourier transform and multiple non-transform kernels, including matrix-matrix multiplication, synthetic aperture radar (SAR), circular convolution, sorting networks, and Viterbi decoding. The control flow of the kernels is data-independent, which allows us to cast their algorithms as operator expressions. Using rewriting systems, a structural architecture model and empirical search, we automatically generate very fast C implementations for state-of-the-art multicore CPUs that rival hand-tuned implementations.
منابع مشابه
Compiling Stream Kernels for Polymorphous Computing Architectures
Polymorphous Computing Architectures (PCA) have multiple modes of operation and can reassign resources allocated to these modes during program execution. Such architectures enable a single computational fabric to meet the diverse computing needs of complex applications that previously required multiple, distinct HW/SW solutions integrated into a system solution. The MONARCH chip is a PCA capabl...
متن کاملFFT Program Generation for the Cell BE
The complexity of the Cell BE’s architecture makes it difficult and time consuming to develop multithreaded, vectorized, high-performance numerical libraries. Our approach to solving this problem is to use Spiral, a program generation system, to automatically generate and optimize linear transform libraries for the Cell. To extend the Spiral framework to support the Cell architecture, we first ...
متن کاملAutomatic Generation of Sparse Tensor Kernels with Workspaces
Recent advances in compiler theory describe how to compile sparse tensor algebra. Prior work, however, does not describe how to generate efficient code that takes advantage of temporary workspaces. These are often used to hand-optimize important kernels such as sparse matrix multiplication and the matricized tensor times Khatri-Rao product. Without this capability, compilers and code generators...
متن کاملAdaptive Dynamic Scheduling of Fft on Hierarchical Memory and Multi - Core Architectures
In this dissertation, we present a framework for expressing, evaluating and executing dynamic schedules for FFT computation on hierarchical and shared memory multiprocessor / multi-core architectures. The framework employs a two layered optimization methodology to adapt the FFT computation to a given architecture and dataset. At installation time, the code generator adapts to the microprocessor...
متن کاملMulti Objective Scheduling of Utility-scale Energy Storages and Demand Response Programs Portfolio for Grid Integration of Wind Power
Increasing the penetration of variable wind generation in power systems has created some new challenges in the power system operation. In such a situation, the inclusion of flexible resources which have the potential of facilitating wind power integration is necessary. Demand response (DR) programs and emerging utility-scale energy storages (ESs) are known as two powerful flexible tools that ca...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2009